Making Data a First Class Scientific Output: Data Citation and Publication by NERC's Environmental Data Centres

Authors

Sarah Callaghan

Steve Donegan

Sam Pepler

Mark Thorley

Nathan Cunningham

Peter Kirsch

Linda Ault

Patrick Bell

Rod Bowie

Adam M. Leadbetter

Roy K. Lowry

Gwenaëlle Moncoiffé

Kate Harrison

Ben Smith-Haddon

Anita Weatherby

Dan Wright

Abstract

The NERC Science Information Strategy Data Citation and Publication project aims to develop and formalise a method for citing and publishing the datasets stored in its environmental data centres. It is believed that this will act as an incentive for scientists, who often invest a great deal of effort in creating datasets, to submit their data to a suitable data repository where it can be properly archived and curated. Data citation and publication will also provide a mechanism for data producers to receive credit for their work, thereby encouraging them to share their data more freely.

International Journal of Digital Curation (2012), 7(1), 107–113. http://dx.doi.org/10.2218/ijdc.v7i1.218

The International Journal of Digital Curation is an international journal committed to scholarly excellence and dedicated to the advancement of digital curation across a wide range of sectors. The IJDC is published by UKOLN at the University of Bath and is a publication of the Digital Curation Centre. ISSN: 1746-8256. URL: http://www.ijdc.net/

Introduction

Through much of scientific history data has been a scarce resource, requiring significant effort to obtain, but by contrast datasets were generally smaller and easier to share in hard copy format, as tables, pictures or graphs. As scientists’ ability to collect more and increasingly detailed data has increased, their ability to publish it easily has decreased.

Given that the currency of academic credit is based around the journal publication, and given the historic difficulties associated with publishing data, it is not surprising that a scientific culture has arisen where data sharing is viewed with a variety of opinions, from enthusiasm to skepticism or outright hostility. Knowledge is power, and in an increasingly competitive market for research funding, sole possession of a significant dataset might be a key factor in ensuring continued funding.

The benefits of sharing data are many. They include the ability to discover and reuse data which has already been collected, thus avoiding redundant data collection and saving time and money, and the opportunities that sharing provides for collaboration. For this reason, research funders are keen to encourage data sharing. The tension on the researchers’ side is that there is (currently) no universally accepted mechanism for data creators to obtain academic credit for their dataset creation efforts. Consequently, they often prefer to hold the data until they have extracted all the possible publication value they can. Though completely understandable, this behaviour comes at a cost for the wider scientific community.

A tension therefore exists between the need to share data to encourage reuse and collaboration and the need to ensure that the shared data is of good scientific quality and is suitable for reuse. In parallel to this is the data creator’s need for attribution and credit, whilst they balance the reputational risks associated with sharing (including the discovery of errors in the data, increased opportunity for collaboration) versus the benefits of not sharing (such as maximising publications and research funding).

This paper details the work done by the NERC Science Information Strategy Project on Data Citation and Publication, and attempts to put the concepts of data citation and publication into the context of work done by the NERC-funded research community. The project is being run as a collaboration of the NERC environmental data centres, who wish to encourage researchers to deposit data in the archives where it can be curated and managed properly. Data citation and publication is being proposed as an incentive for researchers to do just this, and thereby to avoid the situation humorously outlined in Brown (2010).

“Publishing” Versus “publishing”

It is now possible to “publish” data relatively easily; at its most basic, all a researcher has to do is to stick the files on a website somewhere. This makes the data open, but without any form of long-term commitment. There are no guarantees that the data will still be there in six months, or that the files won’t get corrupted. Furthermore, it is possible that a scientist who isn’t the data creator won’t be able to understand the contents, or even open the files at all. Even if the dataset is readable and has sufficient metadata, there is no information about the scientific quality of the dataset, other than that attached to the creator’s reputation.

By contrast, a formal “Publishing” process adds value to the dataset for the future consumers of the data. This may be by providing an indication of the scientific quality and importance of the dataset (as measured through a process of peer-review), or by ensuring that the dataset is complete, frozen, and has enough supporting metadata and other information to allow it to be used by others in the years to come. “Publishing” implies a commitment to persistence of the data. It also provides a mechanism for allowing data producers to obtain academic credit for their work in creating the datasets.

The notion of formally “Published” data does not necessarily imply that the data would be open, but there is no reason why “Published” data should not be open. Figure 1 gives a schematic example of this. There have been many discussions held about closed versus open data, and there will be many more in the future. What is generally well agreed is that it is no longer appropriate to keep significant datasets stored on a single hard drive, or several CDs in a drawer in an office somewhere.

The recent Climategate scandal showed that the general public do indeed have an interest in the work that their taxes are funding. The UK government also wish to make all data from publicly funded research available to the public for free.

Figure 1. The tension between open and closed publication and Publication. (DOIs are digital object identifiers.)

To a scientist, there is little benefit from making their dataset available as a free download from a webpage, unless they work in certain areas of science where this is expected. In fact, the reputational risk of doing so (particularly if others find errors, or worse, take advantage of the dataset to earn new research funding) and the extra work involved in doing so, might mean that the scientist would prefer to store the data on a closed server. Data centres are working with scientists to bring data from the closed servers and CDs into an archive where they can be properly curated, with the eventual aim of publication and the dataset author receiving full academic credit for their efforts.

Data Citation and Publication and the NERC Environmental Data Centres

It is commonly accepted that data curation is a difficult job, and most data producing scientists have neither the time nor the inclination to focus on it. It is for this reason that NERC funds six data centres, which between them have responsibility for the long-term management of NERC’s environmental data holdings. NERC researchers are expected to liaise with these data centres to determine how and what portions of their data should be archived and curated for the long term, and then work with data centre staff to ingest the dataset, together with its accompanying metadata and documentation, into the archives.

NERC are also keen to obtain good value from the research they fund and so have set up the Science Information Strategy (SIS) to provide the framework for NERC to work more closely and effectively with its scientific communities in delivering data and information management services.

The NERC SIS data citation and publication project aims to create a way of promoting access to data, while simultaneously providing the data creators with full academic credit for their efforts. The project also aims to implement a process to ensure the technical and scientific quality of the resulting datasets. To achieve this, we are developing a mechanism for the formal citation of datasets held in the NERC data centres, and are working with academic journal publishers to develop a method for the scientific peer-review and publication of datasets. The first step in this project is to formalise a method for citing datasets, and to encourage the NERC scientific community to use it as standard when discussing datasets in the literature.

Citation of Data Using Digital Object Identifiers (DOIs)

Anyone can reference a dataset stored on the internet by using an appropriate form of words, plus a URL linking to the page where the dataset can be found. However, URLs are renowned for breaking, and so do not deliver the stability that one expects for a formal citation. It is for this reason that we have decided to use Digital Object Identifiers (DOIs) to signify datasets that are complete, in a useable format, stable (changes are implemented by publication of new versions), have valid metadata, have passed the quality control checks within the domain of expertise of the data centre, and have long-term stewardship guaranteed by that data centre, underwritten by the ICSU World Data System. This provides the basis for a dataset to be cited as if it were a research paper, putting it on a par with other scientific outputs.

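The criteria listed above for DOI assignment can be read as a simple checklist. The following is a minimal sketch of that reading in Python; the class name `DatasetStatus`, its field names and the helper functions are all hypothetical and chosen for illustration only. It does not represent any tooling used by the NERC data centres or DataCite.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class DatasetStatus:
    """Hypothetical record of the checks named in the text; field names are illustrative."""
    complete: bool
    usable_format: bool
    stable: bool                  # changes are made only by publishing new versions
    valid_metadata: bool
    passed_quality_control: bool  # checks within the data centre's domain of expertise
    long_term_stewardship: bool   # guaranteed by the data centre


def unmet_criteria(status: DatasetStatus) -> List[str]:
    """Return the names of the criteria the dataset does not yet satisfy."""
    return [f.name for f in fields(status) if not getattr(status, f.name)]


def eligible_for_doi(status: DatasetStatus) -> bool:
    """Under this sketch, a dataset qualifies only when every criterion holds."""
    return not unmet_criteria(status)


# A completed legacy dataset versus one that is still being revised.
legacy = DatasetStatus(True, True, True, True, True, True)
in_progress = DatasetStatus(True, True, False, False, True, True)

print(eligible_for_doi(legacy))
print(unmet_criteria(in_progress))
```

Returning the list of unmet criteria, and not only a yes/no answer, mirrors the idea that dataset authors are shown the technical criteria in advance so that they can bring their data up to the required standard.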
The NERC data centres are not the only groups to use DOIs for citing datasets. For example, in the Earth Sciences, the Pangaea data archive cites its datasets using DOIs, and the ISIS pulsed neutron and muon source issues DOIs for its experiments. Scientists are already used to citing papers using DOIs, so it is only a small change to their behaviour to get them to cite data in the same way. Using DOIs for data also allows us to piggy-back on various pre-existing citation metrics, without having to invent new ones.

At the time of writing, the data citation project has successfully assigned DOIs to 14 datasets held in the NERC environmental data centres. We are still in a testing phase, and guidelines for what constitutes a dataset suitable for DOI assignment are in development. At the moment, all the DOIs that have been assigned are to completed, legacy datasets in the archive. We anticipate that in the near future, dataset authors will be creating datasets with the aim of getting a DOI for them when they’re completed. Permission to assign a DOI (should the dataset meet the criteria) will be sought from the data authors as part of the creation of the data management plan. The technical criteria for DOI assignment will also be presented to the dataset author at this stage, allowing them to ensure that their data meets the criteria.

The DOI assignment account has been issued to NERC by the British Library, acting on behalf of DataCite, as part of a pilot project by DataCite. NERC are not the only DOI-issuer for data in the UK. Other participants include the Archaeology Data Service, the UK Data Archive and the Bodleian Library at the University of Oxford.

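Citing a dataset by DOI in the same way as a paper can be sketched as follows. This is an illustration only: the function names, the layout of the citation string and the placeholder dataset details are assumptions, not a format prescribed by NERC or DataCite. The resolver prefix is the one that appears in this article's own DOI link.

```python
from urllib.parse import quote


def doi_to_url(doi: str, resolver: str = "http://dx.doi.org/") -> str:
    """Build a resolvable link from a DOI by appending it to a resolver prefix."""
    # Percent-encode any unusual characters but keep the prefix/suffix slash.
    return resolver + quote(doi, safe="/")


def format_data_citation(authors: str, year: int, title: str,
                         publisher: str, doi: str) -> str:
    """Assemble a citation string; this layout is illustrative, not an official style."""
    return f"{authors} ({year}): {title}. {publisher}. doi:{doi}"


# The DOI of this article, as printed in its header.
print(doi_to_url("10.2218/ijdc.v7i1.218"))

# Placeholder dataset details; the DOI below is not a real identifier.
print(format_data_citation("Author, A.", 2012, "Example dataset",
                           "Example Data Centre", "10.0000/example"))
```

The point of the sketch is that the citation carries the identifier and not the landing-page address, so the link can be rebuilt even if the dataset's web location changes.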
The methods proposed by the NERC data centres for the citation of their datasets could just as easily be applied by any other data repository, provided it meets the DataCite criteria for being a DOI minter. It is anticipated that a data citation (with DOI) will be of value to the authors of the dataset, even if they never then go through the scientific peer-review process associated with journal publication. A helpful analogy would be to consider a dataset in a data centre (without DOI) as equivalent to grey literature, a dataset with a DOI citation as equivalent to a paper published in conference proceedings, and a dataset published in a formal data journal as equivalent to an academic journal article.




Journal: IJDC, Volume 7

Publication date: 2012
